Equal variance: samples are from populations with similar degree of variability
Graphical tests: boxplots
“Formal” tests: F-ratio test
Parametric tests are most robust to violations of the normality and equal-variance assumptions when sample sizes are equal
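A minimal sketch of the F-ratio statistic, using the beak-length data from the homework take-up (pop A and pop B). It assumes the usual convention of putting the larger sample variance in the numerator. The helper name `f_ratio` is hypothetical, and the code stops at the statistic; the p-value or critical value would come from an F table with the returned df.

```python
from statistics import variance

# Beak lengths (cm) from the homework take-up
pop_a = [5.3, 5.6, 4.3, 4.9, 5.3, 4.1, 5.2, 5.0]
pop_b = [6.1, 4.7, 5.9, 4.7, 6.2, 6.0, 5.4, 4.9]

def f_ratio(x, y):
    """Hypothetical helper: F = larger sample variance / smaller sample variance."""
    vx, vy = variance(x), variance(y)  # sample variances (n - 1 denominator)
    if vx >= vy:
        return vx / vy, len(x) - 1, len(y) - 1  # F, numerator df, denominator df
    return vy / vx, len(y) - 1, len(x) - 1

F, df_num, df_den = f_ratio(pop_a, pop_b)
print(f"F = {F:.2f} on ({df_num}, {df_den}) df")
```

Because the larger variance is always on top, a two-tailed test at α = 0.05 compares F with the upper α/2 = 0.025 critical value.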
Assumptions of parametric tests
Normality, equal variance, random sampling, no outliers
Random sampling: samples are randomly collected from populations; part of experimental design
Necessary for sample -> population inference
Assumptions of parametric tests
Normality, equal variance, random sampling, no outliers
No outliers: no “extreme” values that are very different from rest of sample
Graphical tests: boxplots, histograms
“Formal” tests: Grubbs’ test
Note: outliers are also a problem for non-parametric tests
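A minimal sketch of the Grubbs’ test statistic, G = max|yᵢ − ȳ| / s, applied to pop A from the homework take-up. The helper name `grubbs_g` is hypothetical. The code only computes G; whether the most extreme value counts as an outlier depends on a critical value (which depends on n and α) from a Grubbs’ table. The test assumes the data are otherwise normal and checks one suspect value at a time.

```python
from statistics import mean, stdev

def grubbs_g(sample):
    """Hypothetical helper: Grubbs' G = max |y_i - ybar| / s, plus the value that attains it."""
    ybar, s = mean(sample), stdev(sample)  # stdev uses the n - 1 denominator
    suspect = max(sample, key=lambda y: abs(y - ybar))  # value farthest from the mean
    return abs(suspect - ybar) / s, suspect

pop_a = [5.3, 5.6, 4.3, 4.9, 5.3, 4.1, 5.2, 5.0]  # beak lengths (cm)
G, suspect = grubbs_g(pop_a)
print(f"most extreme value = {suspect}, G = {G:.2f}")
```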
Homework take-up
Perform 2-sample t-test:
pop A: 5.3, 5.6, 4.3, 4.9, 5.3, 4.1, 5.2, 5.0 cm
pop B: 6.1, 4.7, 5.9, 4.7, 6.2, 6.0, 5.4, 4.9 cm
ȳpopA - ȳpopB = -0.53
s(ȳpopA − ȳpopB) = 0.29
t = -1.80
df = 14
p (estimated from t-table): 0.05 < p < 0.10
Write-up: “A 2-tailed, independent 2-sample t-test showed no significant difference between beak length of pop A (4.96 cm ± 0.52 SD) and pop B (5.49 cm ± 0.64 SD) at α = 0.05: t(14) = -1.80, p = 0.094”
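The pooled-variance calculation behind the homework numbers can be sketched as follows. The helper name `pooled_t` is hypothetical; it reproduces ȳA − ȳB, the standard error of the difference, t and df, but not the p-value, which here still comes from a t-table (a library routine such as SciPy's `scipy.stats.ttest_ind` would also report it, assuming SciPy is installed).

```python
from math import sqrt
from statistics import mean, variance

pop_a = [5.3, 5.6, 4.3, 4.9, 5.3, 4.1, 5.2, 5.0]  # beak length (cm)
pop_b = [6.1, 4.7, 5.9, 4.7, 6.2, 6.0, 5.4, 4.9]

def pooled_t(x, y):
    """Hypothetical helper: classic (equal-variance) independent 2-sample t-test."""
    nx, ny = len(x), len(y)
    diff = mean(x) - mean(y)
    # pooled variance: df-weighted average of the two sample variances
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = sqrt(sp2 * (1 / nx + 1 / ny))  # standard error of the difference in means
    return diff, se, diff / se, nx + ny - 2

diff, se, t, df = pooled_t(pop_a, pop_b)
print(f"diff = {diff:.3f}, SE = {se:.2f}, t = {t:.2f}, df = {df}")
```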
Homework take-up
Look through the ecological literature and find an example of a published manuscript that uses either a t-test or one of the tests mentioned in Q2. Provide the following information:
Reference for paper
Scientific question being addressed
Specific hypothesis tested (in mathematical notation)
The results of the t-test (t, df, p) and the authors’ conclusions
Statistical vs. biological significance
Statistical significance: difference unlikely due to chance
Says nothing about biological significance of difference!
With a large enough sample size, even very small differences between populations can be detected
E.g.: consider 2 snail populations, A and B: Ho: µA = µB; Ha: µA ≠ µB (µ = mean size)
Statistical vs. biological significance
Size of A: 5.05 mm (± 2.00 SD); size of B: 5.00 mm (± 2.00 SD)
Sample 50, 200, or 30,000 individuals from each population:
n = 50: t = 0.32, df = 98, p-value = 0.75
n = 200: t = 0.058, df = 398, p-value = 0.95
n = 30,000: t = -4.47, df = 59998, p-value = 7.996 × 10⁻⁶
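Why t grows with n even though the true difference stays at 0.05 mm: the signal-to-noise ratio δ / SE (the noncentrality parameter of the two-sample t-test) scales with √n. A minimal sketch using the slide's population values (5.05 vs 5.00 mm, SD 2.00 mm, equal n per group); the helper name `expected_t` is hypothetical. Observed t values scatter around these centres because of sampling error, which is why individual samples (like the n = 50 and n = 200 results above) can land well away from them.

```python
from math import sqrt

delta = 5.05 - 5.00   # true difference in mean size (mm)
sigma = 2.00          # common population SD (mm)

def expected_t(delta, sigma, n):
    """Hypothetical helper: delta / SE for two groups of n each (noncentrality parameter)."""
    return delta / (sigma * sqrt(2 / n))

for n in (50, 200, 30_000):
    print(n, round(expected_t(delta, sigma, n), 2))
```

Quadrupling n (50 to 200) only doubles this ratio, but going to 30,000 per group pushes it to about 3, enough to make a 1% difference "significant".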
Statistical vs. biological significance
Finally, a statistically significant difference…
Meaningful? Ecologically significant? Statistics can’t answer this question
IMPORTANT: report information (e.g., group means and SDs) that lets readers assess biological significance
“A two-tailed, two-sample independent t-test showed a significant difference in size between pop. A (4.99 mm ± 1.99 SD) and pop. B (5.06 mm ± 1.99 SD) at α = 0.05 (t = -4.47, df = 59998, p-value < 0.0001).”
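One way to put the reported numbers on a biological scale is an effect size. This sketch uses Cohen's d, a standardized effect size not mentioned in the slides, computed from the summary values in the write-up. The pooled-SD shortcut below assumes equal group sizes, as in this example. By Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), |d| ≈ 0.035 is tiny.

```python
mean_a, mean_b = 4.99, 5.06   # mm, from the write-up
sd_a, sd_b = 1.99, 1.99

diff = mean_a - mean_b                          # raw difference (mm)
pct = 100 * diff / mean_b                       # difference relative to pop. B (%)
sd_pooled = ((sd_a**2 + sd_b**2) / 2) ** 0.5    # simple pooled SD; assumes n_A == n_B
d = diff / sd_pooled                            # Cohen's d (standardized difference)
print(f"difference = {diff:.2f} mm ({pct:.1f}%), Cohen's d = {d:.3f}")
```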
Assumptions of parametric tests
Basic assumptions of parametric t-tests:
Normality, equal variance, random sampling, no outliers
What to do if assumptions are violated?
Homework take-up
t-tests have several assumptions. Alternative tests, with more relaxed assumptions, are available to statisticians. In which case would you use the following tests?
Welch’s t-test: when distributions are normal but variances are unequal
Permutation test for two samples: when distributions are not normal (but both groups should still have similar distributions and ~equal variance)
Mann-Whitney-Wilcoxon test: when distributions are not normal and/or outliers are present (but both groups should still have similar distributions and ~equal variance)
Assumptions of parametric tests
QQ-plots: tool for assessing normality
x-axis: theoretical quantiles of the standard normal distribution (SND)
y-axis: ordered sample values
Departures from normality show up as departures from a straight line
Assumptions of parametric tests
In some cases, data can be mathematically “transformed” to meet assumptions of parametric tests
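A common example is the log transformation, often used for right-skewed data whose spread grows with the mean. This sketch uses hypothetical data in which group 2 is exactly 10× group 1 (a purely multiplicative effect): on the raw scale the variances differ 100-fold, while on the log scale they are identical, because log(10y) = log(y) + log(10) only shifts the values.

```python
from math import log
from statistics import variance

# Hypothetical right-skewed data: group 2 = 10 x group 1
g1 = [1, 2, 4, 8, 16]
g2 = [10, 20, 40, 80, 160]

raw_ratio = variance(g2) / variance(g1)           # raw scale: variance ratio of 100

log_g1 = [log(y) for y in g1]
log_g2 = [log(y) for y in g2]
log_ratio = variance(log_g2) / variance(log_g1)   # log scale: ratio of 1 (up to rounding)

print(raw_ratio, log_ratio)
```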
Robust tests
Welch’s t-test: common “robust” test for means of two populations
Robust to violation of the equal-variance assumption; deals better with unequal sample sizes
Parametric test (assumes normal distribution)
Calculates a t statistic from the separate (unpooled) sample variances and recalculates df from the sample sizes and variances (Welch-Satterthwaite approximation)
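A minimal sketch of Welch's t and the Welch-Satterthwaite df, applied to the homework beak-length data. The helper name `welch_t` is hypothetical. With equal sample sizes (8 and 8) the unpooled SE equals the pooled SE, so t matches the classic test; only df changes (14 becomes about 13.4).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite df."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny   # squared SE of each group mean
    se = sqrt(vx + vy)                            # unpooled SE of the difference
    t = (mean(x) - mean(y)) / se
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (vx + vy) ** 2 / (vx**2 / (nx - 1) + vy**2 / (ny - 1))
    return t, df

pop_a = [5.3, 5.6, 4.3, 4.9, 5.3, 4.1, 5.2, 5.0]  # beak length (cm)
pop_b = [6.1, 4.7, 5.9, 4.7, 6.2, 6.0, 5.4, 4.9]
t, df = welch_t(pop_a, pop_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```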
Robust tests
In R: t.test(y1, y2, var.equal = FALSE, paired = FALSE) will use the Welch approach (these are R's defaults, so t.test(y1, y2) also runs Welch's test)
Rank-based tests
Rank-based tests: no assumptions about distribution (non-parametric)
Ranks of data: observations are assigned ranks; the rank sums of the groups (and, for paired tests, the signs of the differences) are compared
Mann-Whitney U test: common alternative to the independent-samples t-test
Wilcoxon signed-rank test: alternative to the paired t-test
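A minimal sketch of the ranking step and the Mann-Whitney U statistic, applied to the homework beak-length data. Tied values get the average of the ranks they span (the usual convention). The helper names are hypothetical, and the p-value (from U tables or a normal approximation; in R, `wilcox.test(y1, y2)`) is not computed here.

```python
def average_ranks(values):
    """Hypothetical helper: ranks 1..N, with ties sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Hypothetical helper: rank sum of x and U for each group."""
    ranks = average_ranks(list(x) + list(y))  # rank the pooled data
    r_x = sum(ranks[: len(x)])                # rank sum of group x
    u_x = r_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x               # U_x + U_y = n_x * n_y
    return r_x, u_x, u_y

pop_a = [5.3, 5.6, 4.3, 4.9, 5.3, 4.1, 5.2, 5.0]
pop_b = [6.1, 4.7, 5.9, 4.7, 6.2, 6.0, 5.4, 4.9]
r_a, u_a, u_b = mann_whitney_u(pop_a, pop_b)
print(r_a, u_a, u_b)
```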
Rank-based tests
Assumptions: similar distributions for groups, equal variance
Less power than parametric tests (when the parametric assumptions are met)
Best when the normality assumption cannot be met by transformation (weird distribution) or when there are large outliers
Permutation tests
Permutation tests are based on resampling: reshuffling of the original data
Resampling allows parameter estimation when the distribution is unknown, including SEs and CIs of statistics (means, medians)
A common approach is the bootstrap: resample the sample with replacement many times, recalculating the sample statistic each time
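A minimal bootstrap sketch for the SE and an approximate 95% percentile CI of the mean, using pop A from the homework take-up. The helper name `bootstrap`, the 10,000 resamples, and the seed are illustrative choices; passing `stat=median` (from `statistics`) would bootstrap the median instead.

```python
import random
from statistics import mean, stdev

def bootstrap(sample, stat=mean, n_boot=10_000, seed=1):
    """Hypothetical helper: bootstrap replicates of a statistic."""
    rng = random.Random(seed)  # seeded so results are reproducible
    n = len(sample)
    # each replicate: draw n observations WITH replacement, recompute the statistic
    return [stat(rng.choices(sample, k=n)) for _ in range(n_boot)]

pop_a = [5.3, 5.6, 4.3, 4.9, 5.3, 4.1, 5.2, 5.0]  # beak lengths (cm)
boot_means = sorted(bootstrap(pop_a))
se = stdev(boot_means)                       # bootstrap SE of the mean
ci = (boot_means[249], boot_means[9749])     # ~2.5th and ~97.5th percentiles
print(f"SE = {se:.3f}, 95% CI ~ ({ci[0]:.2f}, {ci[1]:.2f})")
```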
Permutation tests
Sample A: n = 40, ȳ= 1.72, s = 4.17
Sample B: n = 35, ȳ= 4.50, s = 4.83
Ho: µA = µB, Ha: µA ≠ µB
Calculate ∆ in means between the two groups (4.50 − 1.72 = 2.78)
Permutation tests
Randomly reshuffle observations between groups (keeping nA = 40 and nB = 35), calculate ∆
Repeat >1,000 times
Record the proportion of reshuffled ∆s with |∆| ≥ 2.78 (the observed difference)
This proportion estimates the p-value and can be used in the “traditional” hypothesis-testing framework
For a graphical explanation: https://www.jwilber.me/permutationtest/
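The raw data for Samples A and B are not listed here, so this minimal sketch runs the same reshuffling procedure on the homework beak-length data instead. It is a two-sided Monte Carlo permutation test on the difference in means, similar in spirit to `permTS(..., method = "exact.mc")` but not a reimplementation of it. The helper name `perm_test`, the 10,000 reshuffles, and the seed are illustrative choices; some authors use (count + 1)/(n_perm + 1) so the estimated p-value can never be exactly 0.

```python
import random

def perm_test(x, y, n_perm=10_000, seed=1):
    """Hypothetical helper: two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    n_x = len(x)
    observed = sum(x) / n_x - sum(y) / len(y)
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reshuffle between groups, keeping both group sizes fixed
        delta = sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / (len(pooled) - n_x)
        if abs(delta) >= abs(observed) - 1e-12:  # small tolerance for float round-off
            count += 1
    return observed, count / n_perm  # proportion of |delta| >= |observed| = p-value estimate

pop_a = [5.3, 5.6, 4.3, 4.9, 5.3, 4.1, 5.2, 5.0]
pop_b = [6.1, 4.7, 5.9, 4.7, 6.2, 6.0, 5.4, 4.9]
observed, p = perm_test(pop_a, pop_b)
print(f"observed diff = {observed:.3f}, permutation p ~ {p:.3f}")
```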
Permutation tests
In R (using the ‘perm’ package):
library(perm)
permTS(y1, y2, alternative = "two.sided", method = "exact.mc", control = permControl(nmc = 10000))
Assumptions: both groups have similar distributions; equal variance
R practice
Get practice doing basic t-tests
Alternatives in next lecture
Dataset (squirrel_data.csv) and lab instructions on Canvas